Clustering with Normalized Cuts is Clustering with a Hyperplane

Authors

  • Ali Rahimi
  • Ben Recht
Abstract

We present a set of clustering algorithms that identify cluster boundaries by searching for a hyperplanar gap in unlabeled data sets. It turns out that the Normalized Cuts algorithm of Shi and Malik [1], originally presented as a graph-theoretic algorithm, can be interpreted as such an algorithm. Viewing Normalized Cuts in this light reveals that it pays more attention to points far from the center of the data set than to points near the center. As a result, it can sometimes split long clusters and is sensitive to outliers. We derive a variant of Normalized Cuts that assigns uniform weight to all points, eliminating the sensitivity to outliers.
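
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard two-way spectral relaxation of Normalized Cuts. It is not the hyperplane variant the paper derives. The Gaussian affinity, the bandwidth `sigma`, the threshold at zero, and the function name `normalized_cut_bipartition` are all assumptions made for illustration.

```python
import numpy as np


def normalized_cut_bipartition(X, sigma=1.0):
    """Split the rows of X into two groups with the usual spectral relaxation.

    Assumptions (not taken from the paper): a dense Gaussian affinity with
    bandwidth `sigma`, and a split at zero on the relaxed indicator vector.
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances, clipped to guard against
    # small negative values caused by floating-point round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity matrix W with no self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Degrees d_i = sum_j W_ij and the symmetric normalized Laplacian
    # L_sym = I - D^{-1/2} W D^{-1/2}.
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(L_sym)

    # Map back to the generalized problem (D - W) y = lambda D y.
    y = d_inv_sqrt * eigvecs[:, 1]

    # Threshold the relaxed indicator at zero to obtain two clusters.
    return (y > 0).astype(int)
```

Calling `normalized_cut_bipartition(X, sigma=1.0)` on an `(n, d)` array returns an array of `n` labels in `{0, 1}`. Which group receives label 0 is arbitrary, because an eigenvector is only defined up to sign.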

Similar resources

A Feature Space View of Spectral Clustering

The transductive SVM is a semi-supervised learning algorithm that searches for a large margin hyperplane in feature space. By withholding the training labels and adding a constraint that favors balanced clusters, it can be turned into a clustering algorithm. The Normalized Cuts clustering algorithm of Shi and Malik, although originally presented as a spectral relaxation of a graph cut problem, ca...

Normalized cuts clustering with prior knowledge and a pre-clustering stage

Clustering is of interest in cases when data are not labeled enough and a prior training stage is unfeasible. In particular, spectral clustering based on graph partitioning is of interest to solve problems with highly non-linearly separable classes. However, spectral methods, such as the well-known normalized cuts, involve the computation of eigenvectors that is a highly time-consuming task in ...

Automatically finding clusters in normalized cuts

Normalized Cuts is a state-of-the-art spectral method for clustering. By applying spectral techniques, the data becomes easier to cluster and then k-means is classically used. Unfortunately the number of clusters must be manually set and it is very sensitive to initialization. Moreover, k-means tends to split large clusters, to merge small clusters, and to favor convex-shaped clusters. In this ...

A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach

In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...

Feature Selection Framework for White Matter Fiber Clustering Based on Normalized Cuts

Due to its ability to automatically identify spatially and functionally related white matter fiber bundles, fiber clustering has the potential to improve our understanding of white matter anatomy. The normalized cuts (NCut) criterion has proven to be a suitable method for clustering fiber tracts. In this work, we show that the NCut value can be used for unsupervised feature selection as a measu...

Publication date: 2004